Improving semistatic compression via phrase-based modeling
Similar resources
Improving semistatic compression via phrase-based modeling
In recent years, new semistatic word-based byte-oriented text compressors, such as Tagged Huffman and those based on Dense Codes, have shown that it is possible to perform fast direct search over compressed text and to decompress arbitrary text passages in collections reduced to around 30-35% of their original size. Much of their success is due to the use of words as source symbols and a...
Improving Semistatic Compression Via Pair-Based Coding
In recent years, new semistatic word-based byte-oriented compressors, such as Plain and Tagged Huffman and the Dense Codes, have been used to improve the efficiency of text retrieval systems while reducing the compressed collections to 30–35% of their original size. In this paper, we present a new semistatic compressor, called Pair-Based End-Tagged Dense Code (PETDC). PETDC compresses Englis...
Improving Phrase Extraction via MBR Phrase Scoring and Pruning
One of the major causes of translation errors in phrase-based SMT systems is incorrect phrases induced from inaccurately word-aligned parallel data. In this paper, we propose a novel approach that uses the minimum Bayes-risk (MBR) principle to improve the accuracy of phrase extraction. Our approach operates as a four-stage pipeline: first, bilingual phrases are extracted from parallel corpu...
Improving Phrase-Based Machine Translation
Current state-of-the-art machine translation systems use a phrase-based scoring model for choosing among candidate translations in a target language, typically English. These models are deemed phrase-based because candidate sentence scores are in large part a product of phrase translation probabilities. These translation probabilities must be learned in some unsupervised manner from a pair of s...
Improving Phrase-based Korean-Englis
In this paper, we describe several techniques to improve Korean-English statistical machine translation. We have built a phrase-based statistical machine translation system in a travel domain. On the baseline phrase-based system, several techniques are applied to improve the translation quality. Each technique can be applied or removed easily since the techniques are part of the preprocessing m...
Journal
Journal title: Information Processing & Management
Year: 2011
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2011.01.006